SPECIFICATION
TITLE 

SPEAKER-DEPENDENT SPEECH RECOGNITION METHOD AND
SPEECH RECOGNITION SYSTEM

CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application is a U.S. national stage application of International 
Application No. PCT/EP2004/002137 filed March 3, 2004, which designates the United 
States of America, and claims priority to German application number 103 13 310.0 filed 
March 25, 2003, the contents of which are hereby incorporated by reference in their 
entirety. 

FIELD OF TECHNOLOGY 

[0002] The present disclosure relates to a speaker-dependent speech recognition
method with a speech recognition system, in which voice utterances of a user are
trained and commands are assigned to the trained voice utterances, and to a speech
recognition system for carrying out the method.

BACKGROUND 

[0003] According to the prior art, such a method is divided into a speech
recognition mode and a training mode. In the speech recognition mode, voice utterances
of the user are detected, whereupon a command assigned to the voice utterance is found in a
database if the voice utterance exhibits sufficient correspondence with a voice utterance
which belongs to the command and was recorded and stored at an earlier time. In the
speech recognition mode, a new assignment between a new voice utterance and a new
command is not possible. Instead, these processes take place in the training mode, in which
the user utters voice utterances and assigns a command to each individual voice utterance
after it has been recorded. The assignment obtained is stored in the database. Assigned
commands can be, for example, dialing processes for subscribers to a communication
network or voice control commands.

[0004] The method according to the prior art has the disadvantage that the training of
new commands is complicated inasmuch as it is necessary for the user to actively switch
from the speech recognition mode to the training mode every time. This also has a
negative effect on the market acceptance of speech recognition systems.

[0005] On the basis of this, the present disclosure seeks to specify a
speaker-dependent speech recognition method and a speech recognition system for this in
which new commands can be trained in a time-saving manner.

SUMMARY 

[0006] The present disclosure achieves this with regard to the method of the type
initially mentioned in that upon non-recognition of a speech utterance, the speech
recognition system provides the user with the opportunity to immediately assign the voice
utterance to a new command.

[0007] When carrying out the inventive method, a speech recognition system is
always in the speech recognition mode, but the option is available immediately to
perform a new command assignment upon non-recognition of a voice utterance. In this
manner, the training of new commands is integrated into the speech recognition itself
and can take place when a voice utterance has not been recognized. If, for example, the user
wishes to train a new command for a speech recognition system, it is sufficient to
articulate a voice utterance which has not yet been used, whereafter the speech
recognition system determines that the new voice utterance is not recognized
and then offers the option of assigning the voice utterance to a new command. After
the assignment has been performed, the command can be executed immediately.

[0008] In a preferred embodiment of the present disclosure, upon non-recognition of
the voice utterance by the speech recognition system the user optionally may either
repeat the voice utterance or assign a new command to the voice utterance. This embodiment
takes into account that a voice utterance can be just outside the range of similarity to a voice
utterance to which a desired command has already been assigned. In this case, it is not
intended to assign a new voice utterance to a new command. Instead, this voice utterance
must be repeated in order to be linked to the already trained command.

[0009] Having regard to an initial state of a speech recognition system, it is
preferably provided for the method that in the case when no command has yet been assigned
to any voice utterance, the speech recognition system, after having been activated,
immediately offers the training of a new command. This happens automatically when the
speech recognition system naturally does not recognize the first voice utterance and
offers the option of training a new command.

[0010] In another embodiment of the present disclosure, it can be provided that, upon
non-recognition of a voice utterance for a command already trained by the speech
recognition system, the user can select the command and assign the voice utterance to this
command. This refers to the case where a "poor" version of the voice utterance is present in a
database which contains the assignments between voice utterances and associated trained
commands, so that speech recognition frequently fails. It is possible in this case to
assign a new voice utterance to the command already trained.

[0011] For recognition of a voice utterance, a voice pattern is preferably generated
which is assigned to the voice utterance. Such voice patterns, which are based on an 
extraction of essential voice features of the voice utterance, are also then used in the database 
which in this case contains an assignment between voice patterns and trained commands. 
After having been recorded, each voice utterance is converted into a voice pattern which is 
then processed further, such as for the decision whether it is recognizable or not; i.e., is 
already present within a range of similarity of a voice pattern in the database. 
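
The recognizability decision described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a voice pattern is a fixed-length tuple of numeric features, that similarity is measured by Euclidean distance, and that the "range of similarity" is a single distance threshold; the disclosure does not prescribe any of these choices, and the names `distance` and `find_command` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("voice patterns must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_command(pattern, database, threshold):
    """Return the command of the closest stored pattern within the threshold.

    `database` is a list of (stored_pattern, command) pairs. None is returned
    when no stored pattern lies within the assumed range of similarity,
    i.e. the utterance counts as not recognized.
    """
    best_command, best_distance = None, threshold
    for stored_pattern, command in database:
        d = distance(pattern, stored_pattern)
        if d <= best_distance:
            best_command, best_distance = command, d
    return best_command
```

With a database holding the patterns `(0.0, 0.0)` and `(5.0, 5.0)` and a threshold of `1.0`, a pattern of `(0.1, 0.1)` would be recognized, while `(2.5, 2.5)` would fall outside both ranges of similarity.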

[0012] In this connection, it is preferable to check, before a command is assigned to a
voice utterance, whether the voice utterance is similar to previously stored voice
utterances. This prevents confusion among different
commands from occurring during speech recognition because the associated voice
utterances are in each case too similar to one another. For this purpose, a permissible range of
similarity can be defined; for example, by using the extraction features for a voice pattern.
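
A check of this kind might be sketched as follows, under the assumption that voice patterns are numeric feature tuples and that the permissible range of similarity is expressed as a minimum Euclidean distance. The function name `is_distinct` and the parameter `min_distance` are illustrative and do not come from the disclosure.

```python
import math


def is_distinct(candidate, stored_patterns, min_distance):
    """True if the candidate pattern keeps at least `min_distance` from every stored pattern.

    An empty collection of stored patterns trivially yields True, which matches
    the initial state in which no command has yet been trained.
    """
    return all(math.dist(candidate, stored) >= min_distance
               for stored in stored_patterns)
```

A candidate that fails this check would be too close to an existing pattern, so the user could be asked to choose a different utterance before any command is assigned.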

[0013] With regard to a speech recognition system, the abovementioned method is
achieved by a speech recognition system for a speaker-dependent recognition
of voice including a voice recording device for recording a voice utterance of a user of the
speech recognition system, a search engine which is designed for accessing a database
which contains an assignment between voice utterances and commands in order to find a
command assigned to the voice utterance, and a conversion device for converting the
command found due to the voice utterance, the speech recognition system being
designed in such a manner that upon non-recognition of the voice utterance, the speech
recognition system provides the user with the opportunity to immediately assign the voice
utterance to a new command.

[0014] Such a speech recognition system allows the method described above to
be carried out and, compared with known speech recognition systems, is distinguished
by the fact that the training of new commands is made possible in a speech recognition
mode.

[0015] The voice recording device is preferably connected to a memory in which the
voice utterance is temporarily stored and which is connected to the database for reading the
voice utterance into the database. This is not the case in known speech recognition
systems because in these, the database is directly accessed for a training mode, whereas in a
speech recognition mode, although a voice utterance is temporarily stored for the
operation of the search engine, the memory then used is not designed or linked for reading a
voice utterance into the database.

[0016] Preferably, a feature extraction device for generating a voice pattern from the 
voice utterance is provided between the voice recording device and the memory and the voice 
pattern replaces the voice utterance. 

[0017] Additional features and advantages of the present disclosure are described in, 
and will be apparent from, the following Detailed Description and the Figures.

BRIEF DESCRIPTION OF THE FIGURES

[0018] Figure 1 shows a flow chart of a speaker-dependent speech recognition
method in connection with the teachings of the present disclosure.



DETAILED DESCRIPTION 

[0019] A speaker-dependent speech recognition method via a speech
recognition system will now be explained with reference to Figure 1. After a start of the
speech recognition system, which is implemented, for example, as a computer system
with a display device, a suitable user interface which also contains an activation for a
recording of a voice utterance ("push-to-talk" activation) is first displayed to the user. In a
first method step 1, a voice utterance of the user/speaker is recorded with the aid of a suitable
voice recording device. In a second step 2, a voice pattern of the voice utterance is generated
via a feature extraction device, a voice pattern being defined by a combination of extracted
characteristic voice features. The voice pattern is temporarily stored in a memory.

[0020] In a third step 3, a search engine is used to interrogate whether the voice
pattern generated is contained in a database which contains assignments between voice
patterns and commands. This database is provided with contents in a training mode of the
speech recognition system, the training mode being integrated into the process of
speech recognition. If the voice pattern is recognized as already present in the database
and the associated command is found, the command is executed in a fourth step 4, after which
the operating process of the speech recognition system is ended. The sequence from
step 1 to step 4 is automatic in the present illustrated embodiment.

[0021] If the voice pattern generated is not recognized in the third step 3, the user
receives the option of assigning a new command to the unrecognized voice pattern or the
unrecognized voice utterance, respectively, via the user interface of the computer system.
This takes place in a fifth step 5 of the method. At this point, the speech recognition
system is switched into a training mode if the assignment of a new command is desired or
automatically performed. As an alternative to the fifth step 5, the user can also trigger a new
voice utterance recording with the aid of the user interface so that the process returns to the
first step 1 in order to repeat the voice utterance.
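
The branching of the third to fifth steps can be summarized in a short sketch. It assumes that voice patterns are numeric feature tuples compared by Euclidean distance against a threshold, and that the user's choice arrives through a callback returning either "assign" or "repeat". The names `lookup`, `handle_utterance` and `ask_user` are hypothetical and only stand in for the search engine and the user interface.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Pattern = Sequence[float]
Database = List[Tuple[Pattern, str]]


def lookup(pattern: Pattern, database: Database, threshold: float) -> Optional[str]:
    """Step 3: return the command of the nearest stored pattern within the threshold."""
    best: Optional[str] = None
    best_distance = threshold
    for stored, command in database:
        d = math.dist(pattern, stored)
        if d <= best_distance:
            best, best_distance = command, d
    return best


def handle_utterance(pattern: Pattern, database: Database, threshold: float,
                     ask_user: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Decide what happens after a voice pattern has been generated (step 2)."""
    command = lookup(pattern, database, threshold)
    if command is not None:
        return ("execute", command)   # step 4: recognized, execute the command
    if ask_user() == "assign":
        return ("train", None)        # step 5: continue with integrated training
    return ("repeat", None)           # alternative: return to step 1
```

Because the training branch is reached from inside the recognition flow, no separate mode switch by the user is needed in this sketch, which mirrors the integration described above.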

[0022] If the assignment of a new command to the unrecognized voice pattern is 
selected, a voice utterance corresponding to the unrecognized voice utterance from the first 
step is recorded in a sixth step 6. Following this, a voice pattern is generated from the voice 
utterance recorded in the sixth step 6 in a seventh step 7, in the same manner as in the second 
step 2 explained above. 

[0023] In an eighth step 8, a similarity check between the new voice pattern from the 
seventh step 7 and the voice pattern from the second step 2 is performed. If a desired degree 
of correspondence between the two voice patterns is not obtained, the method begins again 
until a satisfactory result for the similarity of the voice patterns generated in the second step 2 
and the seventh step 7 is obtained. During this process, the third step 3 and the fifth step 5 can 
be skipped. 
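
One possible reading of this repetition loop is sketched below. It assumes that each pass through the method yields a pair of patterns (one from the second step and one from the seventh step), that patterns are numeric feature tuples, and that the "desired degree of correspondence" is a maximum Euclidean distance. The names `patterns_match` and `first_consistent_pair` are illustrative only.

```python
import math


def patterns_match(first, second, max_distance):
    """Step 8: True if the two patterns correspond closely enough (assumed metric)."""
    return math.dist(first, second) <= max_distance


def first_consistent_pair(attempts, max_distance):
    """Return the first (first_pattern, repeated_pattern) pair that passes the check.

    `attempts` is an iterable of pattern pairs, one pair per pass through the
    method. None is returned if no pass produced a satisfactory result.
    """
    for first, second in attempts:
        if patterns_match(first, second, max_distance):
            return first, second
    return None
```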

[0024] In the eighth step 8, a similarity check also may be performed to see whether 
the voice pattern of the newly recorded voice utterance is sufficiently distinct compared with 
the voice patterns already present in the database. If not, the user can be requested to use a 
different voice utterance for assignment for a new command. The method recommences with 
this new voice utterance. 

[0025] Following this, a command is assigned to the voice pattern generated in the
second step 2 in a ninth step 9 by a suitable selection of the user with the aid of the user
interface of the speech recognition system. For this purpose, the voice pattern is read
from the memory in which it was temporarily stored in the second step 2, suitably combined
with the voice pattern generated in step 7, e.g., by averaging individual characteristics of both
voice patterns, and written into the database together with the new command.
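
The averaging mentioned above could look like the following sketch, assuming that both voice patterns are numeric feature tuples of equal length and that the database is a simple list of (pattern, command) pairs. Element-wise averaging is only one way of combining the patterns, and the names `merge_patterns` and `store_assignment` are hypothetical.

```python
def merge_patterns(first, second):
    """Combine two voice patterns by averaging each feature (one possible combination)."""
    if len(first) != len(second):
        raise ValueError("voice patterns must have the same number of features")
    return tuple((a + b) / 2 for a, b in zip(first, second))


def store_assignment(database, first, second, command):
    """Step 9: write the merged pattern into the database together with the new command."""
    merged = merge_patterns(first, second)
    database.append((merged, command))
    return merged
```

Storing the merged pattern rather than either single recording is intended to reduce the influence of one unusual pronunciation on later recognition.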

[0026] In a final step 10, the newly assigned command is executed, after which the
speech recognition process with integrated training mode is concluded.

[0027] It must be emphasized that the execution of a command in the
fourth and last steps takes place with the aid of a conversion device for converting the
command. The command can be, for example, the dialing of a telephone number in a
communication network or a voice command via which devices connected to a network are
controlled.

[0028] Naturally, in a simplified embodiment of the method, the performance of the 
preceding steps 6 to 8 can be omitted when a command is assigned according to the ninth 
step 9. In this manner, a command is assigned immediately following the interrogation from 
the fifth step 5. It is also possible to dispense with the immediate execution of the newly 
trained command (tenth step) during the performance of the method. 

[0029] Although the present disclosure has been described with reference to specific 
embodiments, those of skill in the art will recognize that changes may be made thereto 
without departing from the spirit and scope of the present disclosure as set forth in the 
hereafter appended claims.

ABSTRACT OF THE DISCLOSURE

A speaker-dependent speech recognition method is provided involving the use
of a speech recognition system, during which voice utterances of the user are trained,
and commands are assigned to the trained voice utterances. The present disclosure seeks to
carry out a training of new commands in a time-saving manner. To this end, in the event of a
non-recognition of a voice utterance, the speech recognition system provides the user
with the opportunity to immediately assign the voice utterance to a new command.